This script provides the infrastructure to process the raw seabird data files into a usable format for the Bay Study.
The seabird instrument is a CTD (a file recording
Conductivity, Temperature, and
Depth data) represented as two types of data files, a
.cnv and a .hex file. Both can be opened in a
text editor and both contains data related to the cast. According to
Jillian, the seabird has proprietary software that converts the
.hex file into a .cnv file that contains the
tabular data. Therefore, the script will only focus on the
.cnv file going forward.
Data is read via the read.ctd.sbe function as part of
the oce package. This function is specific
only to a Seabird CTD file output. Alternatively, the
read.ctd function can be used if users are not sure what
instrument the CTD file (has overhead code to check which instrument the
CTD comes from). Although the .cnv file is a text file and
can be simply read as such, the main advantages of the
read.ctd.sbe file is two folds: 1) metadata is
automatically populated and 2) pressure is calculated from the depth,
condutivity, and temperature data. Since I have yet to explore the
relationships between pressure, depth, conductivity, and temperature to
confirm a linear relationship, it is simply easier to use the
read.ctd.sbe function to access the pressure data.
# Specify the month, year, and total number of casts
# A cast is defined as a deployment of the Seabird into the water to collect data. Theoretically, there is one cast per station.
# Function to read in the seabird data
# Make sure you're connected to VPN if this is on the U drive
read_seabird <- function(year, month,
path = "U:\\LTM\\Bay Study\\SeaBird\\Data") {
# Does the path exist?
if (!file.exists(path)) stop("Make sure the file path exists.", call. = F)
if (missing(year)) {
message("Pick a year:")
return(print(list.files(path)))
}
if (missing(month)) {
message("Pick a month:")
return(print(list.files(file.path(path, year))))
}
# List of files in the designated year and month
files <- list.files(file.path(path, year, month), pattern = "*.cnv", full.names = TRUE)
# Reading in all the files. Will return a list
lapply(files, function(x) {
# Pulling the cast name from the file name, which is the values between the month, year, and .cnv strings
castName <- gsub(paste0(".*/", year, "/", month, "/", month, year, "(\\d+)\\.cnv"), "\\1", x)
oce::read.ctd.sbe(x, station = castName)
}) %>%
setNames(gsub(".*/(.*)\\.cnv", "\\1", files))
}
# Actually read it in
seabirdCtdJan23 <- read_seabird(2023, "January")
# Do this across ALL the casts in that file
seabirdJan23 <- lapply(seabirdCtdJan23, function(x) {
slot(x, "data") %>%
data.frame() %>%
mutate(cast = x@metadata$station,
startTime = force_tz(x@metadata$startTime, tzone = "America/Los_Angeles"),
month = format(startTime, format = "%m"),
depthBin = cut(depth,
breaks = c(0, seq(0.7, ceiling(max(depth)) + 0.2, by = 0.5)),
labels = seq(0.5, ceiling(max(depth)), by = 0.5)))
})
The seabird is initially soaked at the top of the water surface
before being cast down into the water. The seabird logs data throughout
this period, but we would like to only grab data from when it is
descending the water column. We can use a breakpoint analysis to find
this segment during each cast. A breakpoint analysis involves fitting a
model that fits several linear models across the data; the connections
between these individual lines represents the breakpoints, from which we
can leverage to isolate the downcast. The model trains on the pressure
value and is given the freedom to iterate up to a max number of 5
breakpoints. This value, Kmax, can be changed but initial
analysis shows that 5 works well.
library(segmented)
# Fit the model, extract the breakpoints and various fit metrics
breakpointsSelect <- lapply(seabirdJan23,
function(x) {
x <<- x
lmMod <- lm(pressure ~ scan, data = x)
# Change Kmax if needed
fitSegmented <- tryCatch(selgmented(lmMod, type = "bic", Kmax = 5),
error = function(cond) {
warning(cond,
call. = F)
})
rsme <- sqrt(mean((x$pressure - fitSegmented$fitted.values)^2))
breakpoints <- fitSegmented$psi[, 2]
p <- function(y) {
plot(fitSegmented)
points(x[, c("scan", "pressure")])
abline(v = breakpoints, col = "#FF4D00")
}
# Return this
list(data = x,
mod = fitSegmented,
rsme = rsme,
breakpoints = breakpoints,
plot = function(){p(fitSegmented)})
}) %>%
setNames(names(seabirdJan23))
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1373.6898 1275.2490 192.2401 193.4804 -1027.4349 -1034.2194
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2332.7993 2109.5922 430.6863 433.3173 445.5128 -1293.3397
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1380.7901 1276.2965 227.5757 224.9449 -1064.4610 -1784.4546
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2166.5495 2009.5870 368.1030 -1146.7781 386.8056 -1348.4399
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1045.68994 964.55163 969.62478 -38.44593 -26.91763 -1261.87201
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2097.8573 1934.6604 231.8123 237.9149 -1331.0200 256.7610
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1529.9364 1419.1485 276.2517 272.2196 -1061.9336 -1060.5751
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2507.4056 2417.4226 281.4545 258.3791 -356.9328 -1066.2651
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1034.75201 955.18712 958.09052 62.40718 74.03101 -1017.84757
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1137.2726 1050.1356 295.8374 300.8899 -969.0046 -958.5499
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2493.9580 2233.9680 305.4928 -652.2075 -928.7491 -1165.7546
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2022.1152 1863.8197 189.2811 184.2769 -1220.9249 -1777.0192
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 730.1752 661.0903 -102.5939 -110.5507 -1300.6263 NA
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 750.862916 683.474222 682.466842 -15.000259 -1540.027803 7.744116
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 822.01979 755.14803 757.20423 -18.23934 -1250.45841 -1242.14328
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1649.0453 1512.1214 124.4223 121.4241 132.6003 -1437.2588
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1887.9475 1730.5567 193.9997 180.3350 -1269.1369 202.8412
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1944.7676 1778.6865 234.5148 228.4115 -1284.0124 -1279.2787
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2042.4269 1861.1223 556.6304 560.5748 -1434.1316 -1429.5565
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 414.66581 311.52462 -72.52492 -973.72246 -65.98031 -1424.49333
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 897.53806 828.42145 831.25690 34.19534 -230.87001 -508.28721
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## -97.37275 -464.02006 -720.20126 -1032.71643 -1026.94441 -1054.53568
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2137.6638 1908.7248 662.7625 -571.5509 -920.2182 -1065.1409
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 537.8641 467.9012 467.9433 -131.7067 -1246.7035 -1300.8363
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 981.8082 897.8037 194.5997 188.5088 -469.3392 -462.2944
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2482.9506 2228.0688 546.7809 -372.7311 -462.0744 -406.6882
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 679.7221 576.7691 164.6503 165.7044 -542.6680 -628.6545
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 740.2834 741.4615 683.2352 238.6401 166.0052 166.3922
## No. of selected breakpoints: 4
## breakpoint estimate(s): 10.07565 11.37635 270.8294 291.5222 293.9892
## Error : at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)
## breakpoint estimate(s): 10.07565 11.37635 270.8294 291.5222 293.9892
## Error : at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 813.14003 739.19249 -33.54109 -47.89093 -38.00787 NA
## No. of selected breakpoints: 3
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2483.0472 2259.0862 441.7947 442.1874 -451.1717 -456.5763
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 888.73002 814.97564 816.91136 -33.50717 -1015.85128 -1004.57826
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 929.0426 844.5407 120.4342 -751.5495 -1134.2032 -1125.1059
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1985.8268 1789.3462 247.9881 240.6862 -816.6322 -804.9337
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 3749.5919 3240.5537 497.5725 -464.8167 -515.1256 -451.1259
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2179.2326 1985.4520 475.1805 -970.7560 -1175.2969 -1424.7996
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 7067.036 6790.879 4431.347 -12214.312 4463.171 4478.972
## No. of selected breakpoints: 3
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2474.5381 2174.4660 641.1900 -194.4778 -288.3294 -314.8408
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1823.2457 1646.6961 273.2282 -82.6773 -246.6227 -135.2775
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 996.42428 906.74877 -41.36237 -56.11420 -836.13649 -33.52033
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 3616.5770 3026.1138 191.6586 -490.6517 -606.2570 -552.4531
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1151.1155 1029.9602 225.0398 209.2788 221.0290 -775.2246
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 929.04912 838.08368 65.31832 65.74085 -17.02793 -839.70432
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 811.74474 737.74512 737.71833 51.49536 -1065.15146 -1060.25635
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2022.5389 1833.5710 305.4828 298.6070 -1103.4756 -1096.1259
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 570.79356 486.39334 18.91778 17.35034 -887.50450 -875.95990
## No. of selected breakpoints: 4
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2994.43738 2565.55541 605.14976 508.48020 298.84247 41.62799
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 1162.58946 1012.74993 44.25341 21.26103 -1082.87466 -1159.32252
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2835.1940 2482.2167 480.5125 485.0955 -1090.9989 -1438.6539
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 962.29790 886.89491 890.92238 -90.34423 -1268.11459 -1288.25494
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2534.3008 2418.1016 261.0888 256.7558 267.8822 -1672.3627
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2323.3689 2098.5850 493.9556 -1084.3903 -1139.1639 -1464.1182
## No. of selected breakpoints: 5
## BIC to detect no. of breakpoints
## BIC values:
## 0 1 2 3 4 5
## 2786.1751 2448.6982 341.9478 -1073.6856 -1246.2982 -1196.1912
## No. of selected breakpoints: 4
This analysis yields several breakpoints per cast, and we must isolate the downcast manually.
# An example of the breakpoints fit
breakpointsSelect[[1]]$plot()
To do this, there are several filters that are used: these filters are preliminary and should be adjusted over time as this approach is adopted
Data in between this beginning and ending scan are isolated as the downcast.
downcastBreakpoints <- lapply(breakpointsSelect,
function(x) {
x$data %>%
mutate(breakpoints = ifelse(row_number() %in% round(x$breakpoints),
scan, NA),
maxDepth = ifelse(depth == max(depth, na.rm = T), scan, NA),
fromMax = scan - max(maxDepth, na.rm = T),
slope = depth - lead(depth, 15),
bookEndDowncast = ifelse(!is.na(maxDepth) | (!is.na(breakpoints) & fromMax < -10 & slope < -1) &
scan <= first(which(depth == max(depth, na.rm = T))),
T, F)) %>%
filter(between(scan, first(which(bookEndDowncast == T)), last(which(bookEndDowncast == T))))
})
We can quicky check to see if each cast has correctly isolated only the downcast (increasing depth):
downcastBreakpoints %>%
bind_rows(.id = "cast") %>%
ggplot(aes(scan, pressure)) +
geom_line() +
facet_wrap(~cast, scales = "free")
This requires an input file that lists the date, cast, time, and station taken from the seabird datasheet. This information is joined to the seabird data to assign the station to the downcast data.
stationCast <- read.csv(file.path("SeabirdLogJan-Mar2023.csv")) %>%
group_by(Station) %>%
transmute(startTimeLog = as.POSIXct(paste0(Date, " ", ifelse(nchar(Time) == 3, paste0("0", Time), Time)),
format = "%m/%d/%Y %H%M"),
month = format(as.Date(Date, format = "%m/%d/%Y"), format = "%m"),
cast = Cast, Station, rep = 1:n(), Depth, Comments) %>%
right_join(downcastBreakpoints %>%
bind_rows(.id = "fileName") %>%
mutate(cast = as.integer(cast)),
by = c("month", "cast")) %>%
group_by(cast = factor(cast), Station = factor(Station)) %>%
mutate(timeDifference = as.numeric(difftime(startTime, startTimeLog, units = "mins")))
Although the recorded cast number is generally correct, we can use the difference between start time recorded on the datasheet against on the seabird to make sure that we have correct matches.
distinct(stationCast, cast, rep, Station, timeDifference) %>%
ggplot(aes(factor(Station), timeDifference, color = factor(rep, levels = 1:3))) +
geom_point(size = 5) +
expand_limits(y = 0) +
labs(x = "Station", y = "Time Difference (min)", color = "Rep") +
theme_bw(base_size = 24) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))